Stopping criteria for, and strong convergence of, stochastic gradient descent on Bottou-Curtis-Nocedal functions

نویسندگان

چکیده

Stopping criteria for Stochastic Gradient Descent (SGD) methods play important roles from enabling adaptive step size schemes to providing rigor downstream analyses such as asymptotic inference. Unfortunately, current stopping SGD are often heuristics that rely on normality results or convergence stationary distributions, which may fail exist nonconvex functions and, thereby, limit the applicability of criteria. To address this issue, in work, we rigorously develop two can be applied a broad class functions, term Bottou-Curtis-Nocedal functions. Moreover, prerequisite developing these criteria, prove gradient function evaluated at SGD’s iterates converges strongly zero addresses an open question literature. As result our developed used new bolster other

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Convergence of Stochastic Gradient Descent for PCA

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the ...

متن کامل

Convergence Analysis of Gradient Descent Stochastic Algorithms

This paper proves convergence of a sample-path based stochastic gradient-descent algorithm for optimizing expected-value performance measures in discrete event systems. The algorithm uses increasing precision at successive iterations, and it moves against the direction of a generalized gradient of the computed sample performance function. Two convergence results are established: one, for the ca...

متن کامل

Quantized Stochastic Gradient Descent: Communication versus Convergence

Parallel implementations of stochastic gradient descent (SGD) have received signif1 icant research attention, thanks to excellent scalability properties of this algorithm, 2 and to its efficiency in the context of training deep neural networks. A fundamental 3 barrier for parallelizing large-scale SGD is the fact that the cost of communicat4 ing the gradient updates between nodes can be very la...

متن کامل

Convergence diagnostics for stochastic gradient descent with constant step size

Iterative procedures in stochastic optimization are typically comprised of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...

متن کامل

On Early Stopping in Gradient Descent Learning

In this paper, we study a family of gradient descent algorithms to approximate the regression function from Reproducing Kernel Hilbert Spaces (RKHSs), the family being characterized by a polynomial decreasing rate of step sizes (or learning rate). By solving a bias-variance trade-off we obtain an early stopping rule and some probabilistic upper bounds for the convergence of the algorithms. Thes...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Mathematical Programming

سال: 2021

ISSN: ['0025-5610', '1436-4646']

DOI: https://doi.org/10.1007/s10107-021-01710-6